We  describe  an  algorithm  for  the  rapid  direct  solution  of  linear  algebraic  systems  arising  from  the 
discretization  of  boundary  integral  equations  of  potential  theory  in  two  dimensions.  The  algorithm  is 
combined  with  a  scheme  that  adaptively  rearranges  the  parameterization  of  the  boundary  in  order  to 
minimize  the  ranks  of  the  off-diagonal  blocks  in  the  discretized  operator,  thus  obviating  the  need  for 
the  user  to  supply  a  parameterization  r  of  the  boundary  for  which  the  distance  ||r(s)  —  r(f)||  between 
two  points  on  the  boundary  is  related  to  their  corresponding  distance  |r  —  s|  in  the  parameter  space. 
The  algorithm  has  an  asymptotic  complexity  of  0(nlog2n),  where  n  is  the  number  of  nodes  in  the 
discretization.  The  performance  of  the  algorithm  is  illustrated  with  several  numerical  examples. 
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1  Introduction 


Integral  equations  are  one  of  principal  tools  in  the  analysis  and  solution  of  boundary  value  problems 
for  elliptic  partial  differential  equations.  In  particular,  one  of  the  standard  approaches  to  the  numerical 
treatment  of  boundary  value  problems  for  elliptic  partial  differential  equations  calls  for  reformulating 
them  as  boundary  integral  equations,  discretizing  the  associated  integral  operators,  and  solving  the 
resulting  linear  systems  in  order  to  obtain  solutions  in  the  form  of  layer  potentials.  This  approach 
has  been  widely  studied,  especially  in  the  context  of  Laplace  and  Helmholtz  boundary  value  problems 
on  smooth  domains.  Traditionally,  an  iterative  solver  was  coupled  with  the  appropriate  fast  multipole 
method  (for  example,  [10])  in  order  to  solve  the  discrete  linear  systems  arising  from  the  boundary 
integral  equations.  For  problems  associated  with  the  Laplace  and  Helmholtz  equations,  the  asymptotic 
complexity  of  this  approach  is  0(n)  and  0(n  log  n)  respectively,  with  n  the  number  of  nodes  in  the 
discretization. 

More  recently,  a  number  of  “fast  direct  solvers”  was  developed  for  the  solution  of  linear  systems 
arising  in  various  environments  (see,  for  example,  [4,  5,  19,  8]).  Most  of  these  schemes  are  based  on 
the  observation  that  the  matrices  in  question  have  a  hierarchical  structure  involving  rank-deficient  off- 
diagonal  blocks.  Matrices  with  such  structure  commonly  arise  from  the  boundary  integral  operators  of 
potential  theory  and,  in  particular,  the  authors  in  [19]  describe  a  fast  direct  solver  for  boundary  integral 
equations  in  two  dimensions  that  make  use  of  such  structure.  Hierarchically  rank-deficient  matrices  have 
been  studied  in  a  number  of  other  contexts  (see,  for  instance,  [12,  13,  3]),  and  they  are  strongly  related  to 
the  class  of  “generalized”  Calderon-Zygmund  operators  characterized  by  having  integral  kernels  K(x,y) 
which  are  smooth  for  x  well-separated  from  y  (see  [6]). 

The  solver  in  [19]  operates  by  constructing  a  two-sided  hierarchical  factorization  of  the  inverse  of 
a  discretized  boundary  integral  operator.  When  applied  to  boundary  integral  equations  of  potential 
theory  in  two  dimensions,  the  solver  has  an  asymptotic  complexity  of  O(n),  with  n  the  number  of  nodes 
used  to  discretize  the  integral  equation.  The  two-sided  factorization,  which  appears  to  be  necessary 
in  order  to  obtain  an  algorithm  that  is  asymptotically  0(n),  is  relatively  complicated  and  makes  the 
algorithm  difficult  to  implement.  Moreover,  the  solver  in  [19]  relies  on  certain  strong  assumptions  about 
the  regions  on  which  the  PDEs  are  to  be  solved.  In  particular,  it  assumes  that  the  boundary  of  the 
region  is  specified  via  a  parameterization  r  :  [0, 1] R2  such  that  the  distance  ||r(s)  -  r(t)||  between 
two  points  on  the  curve  is  related  to  their  corresponding  distance  |r  -  s|  in  the  parameter  space.  For 
certain  complicated  curves  such  parameterizations  can  be  difficult  to  obtain,  and  in  the  absence  of  a 
parameterization  with  this  property  the  resulting  discretized  operator  can  exhibit  high  rank  off-diagonal 
blocks.  This  becomes  an  even  more  serious  problem  for  boundary  integral  equations  in  three-dimensions 
since  for  a  parameterization  r  of  a  boundary  surface  ||r(si,<i)  —  r(s2,£2)||  generally  bears  no  relation  to 
the  Euclidean  distance  between  (si,£i)  and  (s2,t2). 

In  this  paper,  we  introduce  a  simple  direct  solver  that  is  similar  to  the  one  in  [19],  but  operates 
by  constructing  a  one-sided  hierarchical  factorization  of  the  inverse  of  a  matrix.  When  applied  to  a 
matrix  with  rank  deficient  off-diagonal  blocks  and  no  other  structure,  the  solver  is  asymptotically  0(n2) 
in  the  dimension  n  of  the  matrix.  When  applied  to  boundary  integral  equations  in  two  dimensions 
arising  from  partial  differential  equations  for  which  a  Green’s  function  is  non-oscillatory  (or,  weakly 
oscillatory,  as  defined  in  [19]),  the  complexity  of  the  solver  is  reduced  to  0(n log2  n).  Not  only  is  this 
solver  considerably  simpler  to  implement  than  that  of  [19],  but  it  also  addresses  an  important  weakness 
of  that  solver.  That  is,  it  includes  an  adaptive  rotation  scheme  that  rearranges  the  parameterization  of 
the  boundary  in  order  to  minimize  the  ranks  of  the  off-diagonal  blocks.  This  scheme  obviates  the  need 
for  the  user  to  supply  a  parameterization  r  of  the  boundary  curve  for  which  ||r(t)  -  r(s)||  is  related  to 
|£  —  s|  and  thereby  expands  the  class  of  regions  to  which  the  solver  is  applicable.  Moreover,  the  scheme 
generalizes  readily  to  the  three  dimensional  setting,  where  it  is  expected  to  be  a  useful  tool. 

The  paper  is  structured  as  follows.  Section  2  provides  the  necessary  mathematical  and  numerical 
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preliminaries.  Section  3  reviews  solution  of  boundary  value  problems  for  Laplace’s  equation  via  bound¬ 
ary  integral  equations.  In  Section  4,  we  introduce  the  multi-level  algorithm  for  the  construction  of  a 
compressed  factorization  of  the  inverse  of  a  matrix;  the  algorithm  applies  generally  to  matrices  with 
rank  deficient  off-diagonal  blocks.  Section  5  contains  a  formal  description  of  the  algorithm  outlined 
in  Section  4  and  assesses  its  computational  cost.  In  Section  6,  we  show  how  the  algorithm  presented 
in  Section  5  can  be  accelerated  when  the  matrix  arises  from  the  discretization  of  a  boundary  integral 
operator.  Section  7  illustrates  through  numerical  examples  the  performance  of  the  algorithm  when  ap¬ 
plied  to  boundary  integral  equations.  Finally,  in  Section  8,  we  summarize  the  work  and  discuss  possible 
extensions  and  generalizations. 

2  Mathematical  and  numerical  preliminaries 

Throughout  the  paper  we  use  the  following  notation.  Given  a  matrix  X,  we  let  X *  denote  its  adjoint  (the 
complex  conjugate  transpose),  ffk(X)  its  fcth  singular  value,  J| JV |j 2  its  I2-norm,  and  ||X|)^  its  Frobenius 
norm.  Finally,  given  matrices  A,  B ,  C,  and  D ,  we  let 

(A  B  ),  ^  c  j  ,  and  ^  c  D  ^  (2.1) 

denote  larger  matrices  obtained  by  combining  the  blocks  A,  B,  C,  and  D. 

2.1  Singular  value  decomposition 

The  singular  value  decomposition  (SVD)  is  a  ubiquitous  tool  in  numerical  analysis,  provided  in  the  case 
of  real  matrices  by  the  following  lemma  (see,  for  instance,  [21]  for  more  details). 

Lemma  2.1  (SVD)  For  any  nxm  real  matrix  A,  there  exist,  for  some  integer  p,  annxp  real  matrix  U 
with  orthonormal  columns,  anmxp  real  matrix  V  with  orthonormal  columns,  and  a  pxp  real  diagonal 
matrix  E  with  positive  diagonal  entries  ui  >  02  >  •  •  ■  >  crp  >  0,  such  that  A  —  U 12V* . 

The  diagonal  entries  of  E  are  called  singular  values,  the  columns  ut  of  the  matrix  U  are  called 
the  left  singular  vectors,  and  the  columns  V{  of  the  matrix  V  are  called  the  right  singular  vectors.  The 
number  p  is  called  the  (mathematical)  rank  of  A.  Note  that  the  SVD  of  A  can  be  written  as 

v 

A  =  UEV*  =  (2.2) 

1  =  1 

where 

U  Up]  and  V  =  [vx, . . .  ,vv\\  (2.3) 

obviously,  (2.2)  provides  a  decomposition  of  the  matrix  A  into  a  sum  of  p  rank  one  matrices. 

A  common  application  of  the  SVD  is  for  the  approximations  of  matrices,  as  described  by  the  following 
lemma  (see,  for  instance,  [1]). 

Lemma  2.2  Suppose  A  £  BtnXm  has  the  SVD 

p 

A  =  (7EV*  =  (2.4) 

t  =  l 

and  the  matrix  B  e  RnXm  is  defined  by  the  formula 

k 

B  =  ^OjUiV*,  (2.5) 

i=l 
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where  k  is  an  integer  with  1  <  k  <  p.  Then 

min  ||A  -  Ak\\2  =  ||A  -  B||2  =  ffjt+i,  (2-6) 

where  Ak  ranges  over  the  set  of  all  n  x  m  matrices  of  rank  k.  In  other  words,  B  is  the  best  rank  k 
approximation  of  A. 

The  SVD  allows  us  to  introduce  the  concept  of  the  numerical  rank  of  a  matrix.  For  some  small  e, 
we  define  the  e-rank  of  a  matrix  A  via  the  formula 

rank(A.e)  =  min  rank(B)  (2-7) 

(see  [9]).  In  other  words,  rank(A,e)  equals  k  if  and  only  if  there  are  exactly  k  singular  values  of  A  that 
lie  above  e,  i.e., 

<?k  >  £  >  ffjt+i,  (2-8) 


with  <7p+i  defined  to  be  0. 


2.2  QR  decomposition 

The  singular  value  decomposition  provides  the  optimal  rank  k  approximation  to  a  given  matrix,  how¬ 
ever,  the  SVD  is  relatively  expensive  to  construct,  and  other,  less  computationally  expensive,  matrix 
factorizations  are  often  used. 

Given  a  real  m  x  n  matrix  M,  l  —  min(m,n),  and  an  integer  k  with  0  <  k  <  l,  the  classical  QR 
decomposition  (see,  for  instance,  [9])  constructs  a  factorization  of  the  form 

MU  =  QR  =  Q  (  ^  £*),  (2-9) 


where  II  6  Rnxn  is  a  permutation  matrix,  Q  €  Rmxi  has  orthonormal  columns,  R  6  Rixn,  Ak  6  M.kxk 
is  upper  triangular  with  nonnegative  diagonal  entries,  Bk  €  R* x  k\  and  Ck  £  R^  9  (  ■* . 

The  following  theorem  can  be  found,  in  a  slightly  different  form,  in  [11].  It  asserts  that,  given  any 
real  mxn  matrix  M,  there  exists  a  factorization  of  the  form  (2.9)  satisfying  inequalities  such  that  (2.9) 
provides  a  reasonable  means  to  detect  the  numerical  rank  of  M. 

Theorem  2.3  (Gu  and  Eisenstat)  Suppose  that  M  is  a  real  mxn  matrix,  l  =  min(m,n),  and  M  has 
p  singular  values.  Then  for  any  integer  k  with  0  <  k  ^  p,  there  exists  a  factorization  of  the  form.  (2.9) 
such  that 


o-k(Ak) 

> 

/- - r. - TTff*(M)> 

\f 1  +  k(n  —  k) 

l|C*||2 

< 

\Jl  +  k{n  -  k)ak+i{M), 

< 

y/k(n  —  k). 

(2.10) 

(2.11) 

(2.12) 


Theorem  2.3  implies  that  if  M  6  Rmxn  is  a  matrix  of  numerical  rank  k  to  precision  e,  there  exists 
a  permutation  matrix  II  £  Rnxn  such  that  the  first  k  columns  of  MU  form  a  well-conditioned  basis  for 
the  column  space  of  M ,  to  within  e.  Let  j  i ,  J2 ,  •  ■  ■  ,jk  be  the  column  indices  of  M  corresponding  to  the 
first  k  columns  of  MU;  then  we  will  refer  to  the  m  x  k  matrix  consisting  of  the  columns  of  M  numbered 
ji,j2,  ...,jk  as  a  column  skeleton  of  M.  Furthermore,  in  this  case  the  inequality  (2.11)  implies  that  M 
can  be  accurately  approximated  by  a  matrix  of  rank  k.  Specifically, 


M^QS, 


(2.13) 
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where  Q  is  the  mxk  matrix  formed  by  the  first  k  columns  of  Q  in  (2.9)  and  S  is  a  k  x  n  matrix  defined 
by  the  formula 

S={Ak  Bk  )n\  (2.14) 

and 

\\M  -  QS\\2  <  y/l  +  k{n  -  k)e.  (2.15) 

Remark  2.1  While  Theorem  2.3  asserts  the  existence  of  a  QR  decomposition  of  the  form  (2.9)  satisfying 
(2.10)-(2.12),  it  does  not  address  the  question  of  how  to  construct  it  numerically.  In  [11],  a  robust, 
provably  stable  algorithm  is  presented  that  constructs  a  QR  decomposition  of  the  form  (2.9),  with  (2.10)- 
(2.12)  replaced  by  the  weaker  inequalities: 


Ok  ( Ak ) 

> 

1  rrJM), 

\J  1  +  nk(n  —  k) 

(2.16) 

\\Ck\\2 

< 

\/l  +nk[n-  k)ok+i{M), 

(2.17) 

< 

yjnk[n-k). 

(2.18) 

In  this  paper,  we  use  the  pivoted  Gram-Schmidt  algorithm  (with  reorthogonalization)  described  in  [ 1 ]  to 
construct  factorizations  of  the  form  (2.13).  While  there  are  no  guaranteed  bounds  of  the  form  (2.15)  for 
this  algorithm,  it  is  simple  to  implement  and  does  well  in  practice.  In  particular,  one  applies  the  pivoted 
Gram-Schmidt  algorithm  to  the  columns  of  M ,  halting  the  procedure  when  the  l2 -norm  of  the  remaining 
columns  falls  below  a  preset  threshold  e.  This  procedure  computes  a  column  skeleton  for  M,  and  if  M  is 
an  m  x  n  matrix  with  numerical  rank  k,  it  requires  0(mnk)  operations.  After  that,  a  factorization  of 
the  form  (2.13)  is  computed  in  0(k2{m  +  n))  operations. 

2.3  Randomized  algorithms  for  the  approximation  of  matrices 

In  [23],  a  randomized  algorithm  is  presented  that  constructs  a  low-rank  approximation  to  a  matrix  A  in 
the  form  A  &  AP,  where  A  is  a  column  skeleton  of  A. 

Suppose  A  is  an  rax  n  matrix,  and  l  and  k  are  positive  integers  with  k  <  l  <  min(m,  n).  The  algorithm 
of  [23]  involves  applying  an  lx  n  random  matrix  $  to  A,  and  then  constructing  a  decomposition  of  the 
form 

«  BP,  (2.19) 

where  B  is  a  column  skeleton  of  4 >j4 ,  consisting  of  k  columns  of  QA  with  indices  •  ■  ■  >jk  (f°r  details 
on  the  exact  form  of  4?,  see  [23]).  If  we  let  A  be  the  m  x  k  matrix  formed  by  collecting  the  k  columns 
of  A  with  the  same  indices,  then  the  product  AP  provides  an  approximation  to  A  such  that 

||yl  -  AP\\2  <  Vmnk<Jk+i{A),  (2.20) 

with  very  high  probability.  The  probability  p  of  (2.20)  is  a  function  of  l  —  k\  its  actual  estimates  are 
detailed,  and  the  reader  is  referred  to  [23]  for  them.  Here  we  merely  observe  that  (for  example)  l-k  =  20 
yields  p  >  1  —  10-17. 

The  randomized  approach  of  [23]  accelerates  the  process  of  finding  a  column  skeleton  of  a  matrix  for 
the  purpose  of  constructing  its  QR  decomposition.  In  order  to  find  a  column  skeleton  of  an  mxn  matrix 
A  with  numerical  rank  k,  one  can  first  form  the  l  x  n  matrix  QA,  which  can  be  done  in  O {rnn  log  l) 
operations  (see  [23]),  and  then  apply  the  pivoted  Gram-Schmidt  algorithm  to  <PA.  The  procedure  involves 
a  total  cost  of  0(mn\ogl  +  Ink),  and  this  is  less  expensive  than  applying  the  pivoted  Gram-Schmidt 
algorithm  directly  to  A,  which  has  a  cost  of  0(mnk). 
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2.4  Sherman-Morrison- Woodbury  formula 

The  Sherman-Morrison- Woodbury  formula  provides  an  expression  for  the  inverse  of  a  low-rank  pertur¬ 
bation  of  an  invertible  matrix.  It  can  be  found,  for  example,  in  [9]. 

Lemma  2.4  Suppose  that  A  is  an  invertible  n  x  n  matrix,  and  that  U  and  V  are  n  x  k  matrices.  Then 
{A  +  UV*)~1  =  A-1  -  A~1U{I  +  V* A~lU)~lV* A~l ,  (2.21) 

assuming  that  the  matrix  (I  +  V* A~lU)  is  invertible. 

The  Sherman-Morrison- Woodbury  formula  implies  that  a  rank  k  perturbation  to  a  matrix  results  in 
a  rank  k  perturbation  to  the  inverse.  In  the  case  that  A  is  an  n  x  n  identity  matrix  I,  (2.21)  reduces  to 
the  following  convenient  form: 

(7  +  UV*)-1  =  1  -U(I  +  V*U)~1V\  (2.22) 

Note  that  the  second  I  appearing  in  the  right-hand  side  of  (2.22)  is  a  k  x  k  identity  matrix. 


3  Boundary  integral  formulations 

In  this  section,  we  briefly  outline  the  solution  of  certain  boundary  value  problems  for  Laplace’s  equation 
via  integral  equation  methods.  Thorough  treatment  of  the  classical  theory  can  be  found  in  [17,  20,  7,  15]. 
Extension  of  the  classical  theory  to  the  case  of  Lipschitz  domains  is  discussed  in  [16,  22,  6]. 

Throughout  this  section,  Cl  will  denote  a  bounded,  smooth,  simply  connected  domain  in  the  plane 
with  boundary  dCl,  Clc  will  denote  the  open  region  in  the  plane  exterior  to  fi,  and  dS  will  denote 
integration  with  respect  to  the  arclength  measure  on  dCl. 

3.1  Interior  Dirichlet  problem 

The  interior  Dirichlet  problem  calls  for  the  determination  of  a  function  harmonic  in  Cl  with  prescribed 
values  on  the  boundary  dCl.  That  is,  given  a  continuous  /  :  dCl  — >  R,  we  seek  a  function  u  :  fi  — >  R.  such 
that 

A u(x)  —  0  for  x  e  n, 

lim  u(x)  =  f(p)  for  p  6  dCl.  (3-1 ) 

X  >p 

xGH 

As  is  well-known,  such  a  problem  has  a  unique  solution  that  can  be  represented  as  the  potential  of  a 
dipole  distribution  o  on  dCl: 

u{x)  =  ^-[  o{y)7r-tog\x-y\dS{y),  (3.2) 

Jdn  °vy 

where  denotes  the  outward  normal  derivative  taken  at  the  point  y.  In  particular,  the  function  u(x) 
defined  by  (3.2)  is  harmonic  in  Cl  and  the  limit  of  u(x)  as  x  approaches  the  point  p  £  dCl  from  the 
interior  of  f2  is  given  by  the  jump  relation 

lim  u(x)  =  ^cr(p)  +  o{y)J— \°g  \p~y\dS{y).  (3.3) 

xe£  1  Z7r  Jd n  ai/y 

Thus,  if  <7  satisfies  the  integral  equation 

\°{p)  +  7T  j  0(y)-Jr-\°g\p-y\dS{y)  =  f{p)  (3.4) 

2  27t  Jm  duy 

for  all  p  £  dCl,  then  the  function  u(x)  given  by  (3.2)  is  a  solution  to  problem  (3.1). 
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3.2  Exterior  Dirichlet  problem 

The  exterior  Dirichlet  problem,  which  consists  of  finding  a  function  u  :  Qc  — »  R  such  that 

Au(x )  =  0  for  x  €  fic, 
lun  u{x)  =  /(p)  for  p  €  80,, 


has  a  unique  solution  under  the  additional  assumption  that  u  behaves  as  0(1)  at  infinity. 

A  difficulty  arises,  however,  in  applying  the  approach  of  the  preceding  section.  Namely,  the  integral 
equation  resulting  from  the  jump  relation  for  the  exterior  domain  is  not  uniquely  solvable.  In  particular, 
if  we  represent  the  solution  u  of  (3.5)  in  the  form  (3.2): 

u(x)  =  r-  /  o(y)jr-\°g\x~  y\dS{y),  (3.6) 

2?r  Jdn  ovy 

then  the  limit  of  u(x)  as  x  goes  to  p  G  dQ  from  Q.c  is 

lim  u(x)  =  —  ^cr(p)  +  -—  /  o{y)-^—  log  |p  -  y\dS(y).  (3.7) 

2  27t  Jan  dvy 

This  leads  to  the  integral  equation 

-l°(p)  +  ^~  f  °(y)-Jr-^°&\p-y\dS(y)  =  /(?)•  (3-8) 

*  Zk  JdCi  °vy 

The  operator  appearing  on  the  left-hand  side  of  equation  (3.8)  has  a  one-dimensional  null  space;  more 
specifically, 

-\  +  f  -S-  log  |p  -  y\dS{y)  =  0,  for  all  p  6  dQ.  (3.9) 

2  2tt  Jan  duy 

Note  that  the  exterior  Dirichlet  problem  itself  has  a  unique  solution  u,  but  the  corresponding  dipole 
distribution  a  in  representation  (3.6)  is  only  determined  up  to  a  constant  because 


—  log  |p  -  y\ dS(y)  =  0,  for  p  €  fic. 

ovx , 


(3.10) 


To  overcome  this  difficulty,  we  use  the  modified  double-layer  potential  to  represent  the  solution  u: 


i{x) 


log|x  -  y\  +  1 )  dS{y). 


(3.11) 


This  leads  to  the  integral  equation 

+  CT(y) + *) dS^  =  ^(p)'  (3-12) 

It  is  shown  in  [17]  that  (3.12)  has  a  unique  solution  a  which,  when  inserted  into  (3.11),  produces  the 
(unique)  solution  of  (3.5). 
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3.3  Exterior  Neumann  problem 

The  exterior  Neumann  problem 


Au(x)  —  0  for  x  e  ftc, 

diL 

lim  — — ( x )  =  /(p)  for  p  e  d ft 
x~*p  dvx 

xenc  x 


(3.13) 


is  solvable,  provided  that 


/  . 


f(p)dS(p)  =  0. 


(3.14) 


It  admits  a  unique  solution  under  the  additional  assumption  that  u  goes  to  0  at  infinity.  The  solution 
can  be  represented  in  the  form  of  a  single  layer  potential  arising  from  a  charge  distribution  er  on  3ft: 


u(x)  =  ir[  v(y)\og\x  ~ y\dS(y). 
2^  Jan 


The  proper  charge  distribution  a  is  obtained  by  solving  the  boundary  integral  equation 


-Mp) 


1 

27T  , 


/  <T(y)^-\og\p-y\dS(y)  =  f(p), 
Jan  ol/p 


(3.15) 


(3.16) 


which  is  derived  by  taking  the  derivative  in  the  variable  x  of  both  sides  of  (3.15)  with  respect  to  the 
outward  normal  vector,  taking  the  limit  as  x  goes  to  p  6  9ft  from  the  exterior  of  ft,  and  applying  the 
appropriate  jump  relation.  As  in  the  case  of  the  interior  Dirichlet  problem,  the  integral  equation  (3.16) 
is  uniquely  solvable. 


3.4  Interior  Neumann  problem 

Similarly,  the  interior  Neumann  problem 

Au(x)  =  0  for  x  6  ft, 
du 

lim  — — ( x )  =  f(p)  for  p  6  9ft 
x~,p  avx 
xen  x 


is  uniquely  solvable  (up  to  a  constant),  provided 

/  f(p)dS(p)  =  0.  (3.18) 

Jdn 

Representing  the  solution  u  as  a  single  layer  potential 

u{x)  =  J-  f  a(y)  log  |x  —  y\dS(y)  (3.19) 

2?r  Jan 

leads  to  the  integral  equation 

-^(p)  +  7-/  v{y)-Jr-log\p-y\dS(y)  =  f(p).  (3.20) 

2  2tt  Jan  ovv 

As  in  the  case  of  the  exterior  Dirichlet  problem,  the  operator  appearing  on  the  left-hand  side  of  equation 
(3.20)  has  a  one-dimensional  null  space.  To  overcome  this,  we  consider  instead  the  integral  equation 

~\a^  +  ^Jg  a(v)  (^-lo&\p-y\  +  1'j  dS(v)  =  {OT p e  d^’  (3-21) 

which  has  a  unique  solution  a  and,  when  inserted  into  (3.19),  produces  a  solution  of  (3.17). 
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4  Numerical  apparatus 

In  this  section,  we  describe  a  scheme  for  constructing  a  factorization  of  the  inverse  of  any  matrix  that 
possesses  a  hierarchical  structure  involving  low-rank  off-diagonal  blocks.  We  first  present  a  one-level 
scheme  in  Section  4.1,  and  then  we  discuss  how  it  can  be  applied  recursively  to  obtain  a  multi-level 
scheme  in  Section  4.2.  In  particular,  the  scheme  we  describe  is  applicable  to  matrices  resulting  from  the 
discretization  of  boundary  integral  equations  of  potential  theory. 


4.1  One-level  compression  scheme 

Consider  a  matrix  A  written  in  2  x  2  block  form: 


A  = 


A  Oi 

02  A 


£  ]ft/n+m)  x  (n+m) 


(4.1) 


where  the  off-diagonal  blocks  Ox  €  Rnxm  and  02  6  ffi.mxn  are  of  numerical  rank  k  <  min(m,n) 
(to  precision  e),  and  the  diagonal  blocks  A  6  Rnxn  and  A  €  K.mxm  are  invertible.  A  compressed 
factorization  of  A-1  can  be  obtained  by  transforming  A  into  a  simpler  form,  and  then  applying  the 
Sherman-Morrison- Woodbury  formula. 

First,  we  construct  for  0\  and  02  factorizations  of  the  form  (2.13): 


Oi  =  QiSi  +  O(e)  (4.2) 

A  =  Q2S2  +  0(e),  (4-3) 

where  Q 1  €  RnxA:,  Q2  €  Rmxfc,  Si  €  Rfcxm,  S2  6  Rfcxn,  and  Qx,  Q2  have  orthonormal  columns.  For 

simplicity,  we  will  henceforth  assume  that  the  off-diagonal  blocks  have  exact  rank  k  and  ignore  the  error 

terms. 

Now  if  one  applies  the  matrix 

'  A1  0 
0  A1 


B  = 


(4.4) 


to  A  from  the  left,  one  obtains  a  matrix  of  the  form 

A  = 


I  UiSi 

U2S2  I 


where  U\  =  A  lQ\  and  U2  =  D2  'A-  Expressing  A  as 

A  =  /  +  UV\ 

where 


and 


U  : 


V*  = 


A  0 
0  A 

0  sx 
s2  0 


we  now  use  formula  (2.22)  to  represent  the  inverse  of  A  as: 

A-1  =  I  -  UCV*, 


(4.5) 

(4.6) 

(4.7) 

(4.8) 

(4.9) 


where 

C  =  (/  +  VrU)~1. 


(4.10) 
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(4.11) 


Thus,  we  have  obtained  a  factorization  of  the  inverse  of  A  in  the  form: 

A-1  =  (7  —  UCV*)B. 

Note  that  U  is  (n+m)  x  2k,  V  is  2k  x  (n  +  m),  and  C  is  2k  x  2k.  As  long  as  k  is  small,  the  factorizations 
(4.2),  (4.3),  and  the  C  matrix  (4.10)  can  be  computed  rapidly,  and  the  inverse  of  A  can  be  applied 
rapidly  to  any  vector  or  matrix  by  a  scheme  based  on  (4.11). 

Remark  4.1  In  the  above  we  assume  that  Oi  and  02  have  the  same  rank.  This  assumption  was  made 
for  notational  convenience  and  is  in  no  way  essential  to  the  results. 


4.2  Multi-level  compression  scheme 

The  one-level  compression  scheme  in  Section  4.1  can  be  applied  recursively  to  obtain  a  multi-level 
compression  scheme.  In  this  subsection,  we  construct  a  two-level  scheme;  a  formal  description  of  the 
multi-level  scheme  can  be  found  in  Section  5. 

Consider  a  matrix  A  with  a  two-level  block  structure: 


A  = 


02,1 

02,2 

O 


02,  i 

02,2 

1,2 


O  l 

02,3 

02,4 


02,3 

02,4 


with  the  two  diagonal  blocks  on  the  first  level  defined  by  the  formula 

0i,i  = 

01,2  = 


02,1 

02,1 

02,2 

02,2 

02,3 

02,3 

02,4 

02,4 

II 

f— ‘ 

,2,.. 

(4.12) 


(4.13) 


(4.14) 


We  assume  that  all  diagonal  blocks  Ditj,  i  =  1,2;  j  =  1,2, . . .  ,2i,  are  invertible  square  matrices,  and  all 
off-diagonal  blocks  Oij,  i  =  1,2;  j  -  1,2, . . .  ,2i,  have  numerical  ranks  at  most  k.  For  convenience,  let 
us  denote  the  matrix  ’(/  -  UCV *)  in  (4.11)  by  X,  so  the  one-level  compression  formula  (4.11)  can  be 
rewritten  as 

A~l=XB.  (4.15) 


The  two-level  compression  scheme  is  carried  out  in  a  bottom-up  manner.  First,  we  apply  the  one-level 
compression  scheme  in  Section  4.1  to  the  diagonal  blocks  0i,i  and  0i,2,  obtaining  factorizations  of  their 
inverses: 

D^\=XhlBltl  (4.16) 

Dfl  =  Xh2Bli2.  (4.17) 

That  is,  are  obtained  by  replacing  the  role  of  A  in  (4.1)  by  that  of  0i,i  in  (4.13).  Aii2,0i,2 

are  similarly  obtained. 

Then,  we  apply  the  one-level  compression  scheme  to  the  first-level  block  structure  of  A: 


0  i,i  Oi.i 
01,2  01,2 


(4.18) 


making  use  of  the  fact  that  compressed  factorizations  of  the  inverses  of  0i,i  and  0 1,2  are  already  stored. 
We  thus  obtain  a  compressed  factorization  of  A-1  as 


B,  = 


(4.20) 


where  B2  has  the  block  form 

'  XhlBhl  0 

o  X\^B\,2  J 

As  long  as  k  is  small,  the  compressed  factorizations  of  D and  A-1  can  be  computed  rapidly. 
After  that,  we  can  apply  A-1  to  any  vector  by  a  recursive  scheme  based  on  equations  (4.19)  and  (4.20). 
Note  that  the  recursive  scheme  constructs  an  one-sided  factorization  of  A-1  of  the  form 


A~1=X2X1BU 


(4.21) 


where 


Am  0 

0  a12 


and  Bi  = 


B  1,1  0 

0  B  ij2 


(4.22) 


A  multi-level  compression  scheme  can  be  similarly  obtained  by  applying  the  one-level  compression 
scheme  recursively.  If  A  has  an  n-level  block  structure,  then  the  scheme  constructs  a  one-sided  factor¬ 
ization  of  A-1  of  the  form 

A-1  =  XnXn-i .  ..XiBi,  (4.23) 


where  X\,.  ■ . ,  Xn-i  and  B\  are  block-diagonal  matrices.  This  is  in  contrast  to  the  scheme  in  [19],  which 
constructs  instead  a  two-sided  factorization  of  the  inverse  of  a  matrix.  Section  5  describes  the  numerical 
algorithm  in  detail. 


Remark  4.2  We  only  need  to  compute  and  store  the  matrices  that  are  needed  in  applying  the  inverse 
of  A  to  another  vector  or  matrix.  For  example,  in  the  two-level  scheme  above,  the  matrices  B\t\, 

B2,  Ax,i,  Xx:2,  and  X2  need  not  be  explicitly  formed. 

Remark  4.3  The  multi-level  compression  procedure  is  particularly  applicable  to  matrices  that  arise  from 
the  discretization  of  boundary  integral  operators  of  potential  theory.  Suppose  we  have  the  boundary 
integral  operator 

\cr(x)  +  J  K(x,y)a(y)dS(y),  for  i6l,  (4.24) 

where  T  denotes  the  boundary  of  a  region  LI,  and  K(x,y )  is  the  kernel  of  the  single  or  double  layer 
potential  for  Laplace’s  equation.  Let  A  €  Rnxn  be  a  matrix  obtained  from  discretizing  (4M)  using  n 
nodes  on  T,  and  let  r0  be  a  segment  of  T  that  is  discretized  with  n0  nodes.  Then  for  most  regions  LI 
encountered  in  practice,  the  rank  of  interaction  k  between  the  nodes  on  Ty  and  those  on  the  rest  of  the 
boundary  is  bounded  by  the  logarithm  of  no-  Specifically, 


k  <  clog(n0), 


(4.25) 


where  c  is  a  constant  independent  of  no  and  n  (see,  for  example,  [19]).  The  bound  (4-25)  implies  that  in 
such  case  A  admits  a  hierarchical  block  structure  in  which  the  off-diagonal  blocks  are  of  low  numerical 
rank. 

Remark  4.4  Once  a  multi-level  compression  procedure  is  performed  on  A,  one  can  rapidly  apply  A  1 
to  any  vector  in  a  recursive  manner.  Typically,  if  A  is  an  n  x  n  matrix  arising  from  the  discretization 
of  a  boundary  integral  operator,  it  takes  O(nlogn)  operations  to  apply  its  inverse  to  a  vector. 


5  Algorithm  and  performance 

Section  4  contains  an  informal  description  of  a  hierarchical  compression  scheme  for  matrices  with  low- 
rank  off-diagonal  blocks.  In  this  section,  we  give  an  algorithmic  description  of  such  a  scheme,  and 
estimate  its  efficiency. 
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5.1  One-level  compression 

In  the  following,  we  consider  a  matrix  A  in  the  block  form  (4.1)  satisfying  the  assumptions  of  Section  4.1. 

•  Step  (1)  Construct  for  O i  and  02  factorizations  of  the  form  (2.13): 

Oi«QiS!  (5.1) 

02  «  Q2S2.  (5.2) 

•  Step  (2)  Compute  and  store  D^1,  D^1  ■ 

•  Step  (3)  Apply  D f1  and  D J1  to  Q i  and  Q2  respectively,  obtaining  U\  ~  D^1Q i  and  U2  =  D^Qi- 

•  Step  (4)  Compute  C  —  (/  +  V',U)~X ,  where  U  is  as  in  (4.7)  and  V*  is  as  in  (4.8). 

•  Step  (5)  Store  U\,  U2,  Si,  S2,  and  C. 

Using  the  randomized  procedure  described  in  Section  2.3,  computing  column  skeletons  for  O i  and 
02  requires  0(mn  log  l  +  Ink)  floating  point  operations,  where  l  is  chosen  to  be  k  +  20.  After  that,  the 
factorizations  (5.1)  and  (5.2)  can  be  computed  in  0{k2{n  +  m ))  operations  (see  Remark  2.1).  Thus, 
Step  (1)  requires  0(mn log  l  +  Ink  +  k2(n  +  m))  operations.  Step  (2)  requires  0{n3  +  m3)  operations. 
Step  (3)  requires  0(nmk )  operations.  In  Step  (4),  forming  ( 1  +  V*U )  requires  0(k2(n  +  m))  operations, 
while  inverting  it  requires  an  additional  0(k3)  operations.  Since  k  <  min(m,n),  the  total  cost  is 

T  ~  mnlogl  +  Ink  +  k2(m  +  n)  +  mnk  +  n3  +  m3.  (5.3) 


5.2  Multi-level  compression 

In  this  subsection  we  give  a  detailed  description  of  the  recursive,  multi-level  compression  algorithm.  We 
suppose  that  the  input  matrix  A  is  represented  via  a  multi-level  block  structure  consisting  of  n  levels. 
In  particular,  there  are  2r  diagonal  and  2r  off-diagonal  blocks  belonging  to  level  r,  for  r  =  1, . . .  ,n.  For 
example,  if  A  has  a  two-level  block  structure  represented  by  (4.12),  (4.13),  and  (4.14),  then  the  diagonal 
blocks  of  A  belonging  to  level  r  =  1  are  Di  j  and  Z?i.2,  and  the  off-diagonal  blocks  belonging  to  level 
r  —  2  are  02,i> 02,2,02. 3,  and  02,4-  We  assume  that  the  diagonal  blocks  on  all  levels  are  invertible. 

For  convenience,  in  the  algorithm  below  we  will  adopt  the  following  notation.  Let  B  denote  the 
input  matrix  A  itself  or  a  diagonal  block  on  level  r,  where  r  =  1, . . . ,  n  —  1.  Then  the  hierarchical  block 
structure  of  A  imposes  on  B  the  following  2x2  block  structure: 


D x  Oj 

O2  D2 


(5.4) 


We  define  D±  and  D2  to  be  the  left  and  right  children  of  B,  B  to  be  the  parent  of  D j  and  D2l  and  D\ 
(D2)  to  be  the  left  (right)  sibling  of  D2  (L>i). 

The  following  gives  a  description  of  the  algorithm. 


1.  Computation  of  all  column  skeletons 

For  each  level  r  =  1  apply  the  pivoted  Gram-Schmidt  procedure  to  all  off-diagonal  blocks, 

storing  a  column  skeleton  for  each  off-diagonal  block  in  terms  of  column  indices. 

2.  Compression  step 

•  Step  (0)  Initialize  the  current  block  B  to  be  the  input  matrix  A.  Set  level  =  0.  Go  to  Step  (1). 
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•  Step  (1) 

(i)  If  this  is  the  first  pass  of  the  current  block  B  to  Step  (1),  go  to  Step  (2). 

(ii)  If  this  is  the  second  pass  of  B  to  Step  (1),  go  to  Step  (3). 

•  Step  (2) 

(i)  If  level  =  n  —  1,  invert  the  left  and  right  children  Di  and  D2  of  B.  Store  Dl  1  and  D2  1 . 
Go  back  to  Step  (1). 

(ii)  If  level  <  n  —  1,  update  the  current  block  B  to  be  its  left-child.  Update  level  :=  level  +  1. 
Go  back  to  Step  (1). 

•  Step  (3)  [Comment:  See  representation  (5.4)  of  the  current  block  £.] 

(i)  Construct  for  O i  and  02  factorizations  of  the  form  (2.13): 

0\  «  Q\S\  (5-5) 

02  «  Q2S2.  (5.6) 

(ii)  Apply  the  inverses  of  the  diagonal  subblocks  D\,  D2  of  B  to  Qi,  Q2  respectively,  obtaining 
f/i  =  D~lQl  and  U2  =  D2lQ2. 

(iii)  Compute  the  matrix  C  =  (/  +  V*U)~l,  where  U  is  as  in  (4.7),  and  V *  is  as  in  (4.8). 

(iv)  Store  U\,U2,Si,S2,  and  C. 

(v)  If  level  =  0,  we  are  done.  Otherwise,  go  to  Step  (4). 

•  Step  (4) 

(i)  If  B  is  a  left-child  of  its  parent,  update  B  to  be  its  right-sibling.  Go  back  to  Step  (1). 

(ii)  If  B  is  a  right-child  of  its  parent,  update  B  to  be  its  parent.  Update  level  :=  level  —  1. 

Go  back  to  Step  (1). 

Remark  5.1  In  effect,  for  each  diagonal  block  B  of  A  on  level  r,  where  r  =  1,. . .  ,n  —  1,  the  algorithm 
computes  and  stores  the  quantities  pertaining  to  a  one-level  compression  of  B  (see  Step  (3)).  The 
algorithm  does  it  in  a  recursive  manner  such  that  at  the  time  the  inverse  of  a  diagonal  block  needs  to 
be  applied  (see  Step  (3)(ii)),  the  relevant  quantities  needed  for  its  rapid  application  have  already  been 
constructed. 

Remark  5.2  Only  the  inverses  of  the  lowest-level  diagonal  blocks  are  explicitly  computed  and  stored  by 
the  algorithm  (see  Step  (2)(i)).  The  end  result  is  a  hierarchical  list  of  quantities  that  allows  the  rapid 
application  of  the  inverse  of  A  in  a  recursive  manner. 


5.2.1  Computational  cost 

The  remainder  of  this  subsection  assesses  the  efficiency  of  the  multi-level  compression  algorithm.  Let  N 
be  the  size  of  the  matrix  A,  and  r  =  1, . . . ,  R  be  the  index  for  the  levels,  so  that  the  numbers  of  diagonal 
and  off-diagonal  blocks  that  belong  to  level  r  are  both  equal  to  pr  =  2T.  We  let  nT  denote  the  average 
block  size  (for  both  diagonal  and  off-diagonal  blocks)  on  level  r,  so  that 


nr 


n  i 

2r-i  > 


(5.7) 
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and  let  kr  denote  the  average  rank  of  an  off-diagonal  block  on  level  r. 

1.  Computation  of  all  column  skeletons 

First,  we  estimate  the  cost  of  computing  column  skeletons  for  all  off-diagonal  blocks.  For  each  off- 
diagonal  block  on  level  r,  where  r  —  1, . . . ,  R,  we  apply  the  randomized  algorithm  described  in  Section  2.3 
to  compute  its  column  skeleton.  Each  block  involves  a  cost  of 

t  ~  nj!  log  lr  +  nrlrkr,  (5.8) 

where  we  choose  lr  =  kr  +  20.  For  matrices  with  rank-deficient  off-diagonal  blocks,  we  can  assume  that 

kr  <  c.y/n7,  (5.9) 

log  lr  <  c',  (5.10) 

where  c  and  c'  are  constants  independent  of  nr  and  N.  So  t  in  (5.8)  is  dominated  by 

t  ~  n^.  (5.11) 

The  cost  for  all  off-diagonal  blocks  on  level  r  is  then 

tr  ~  prTil  ~  Nnr,  (5-12) 

where  we  have  used  the  fact  that  prnr  =  N.  The  total  cost  for  all  R  levels  is  thus 

rr~£tr~Arf>r~JV£^.  (5.13) 

t—\  r=l  r=l 

So  we  obtain 

Tx  ~  N 2.  (5.14) 

2.  Compression  step 

We  now  estimate  the  cost  of  the  main  part  of  the  compression  algorithm.  The  cost  of  applying 
a  one-level  compression  to  a  diagonal  block  on  level  R  —  1  follows  the  analysis  of  Section  5.1.  The 
only  difference  is  that  here  a  column  skeleton  is  already  computed,  so  Step  (1)  requires  only  0{k2RnR) 
operations.  The  cost  for  compressing  one  diagonal  block  on  level  R  —  1  is  thus 

t  ~  k2RnR  +  n2RkR  +  ~  nJR,  (5.15) 

and  the  total  cost  for  the  level  is 

tR-i  ~  ~  pRn3R~  Nn\,  (5.16) 

where  we  have  used  the  fact  that  pr  =  2pr_i  and  prUr  =  N. 

The  cost  of  compressing  each  diagonal  block  on  level  r,  where  r  =  1,...,/?  —  2,  comes  entirely 
from  Step  (3)  of  the  algorithm.  Step  3(i)  takes  0(k^+1nT+ 1)  operations  (see  Remark  2.1),  Step  3(ii) 
takes  0(kr+inr+\  log(nr+i))  operations  (see  Remark  4.4),  while  Step  3(iii)  takes  0(k^+1nr+i  +  kr+l) 
operations  (see  Section  5.1).  Thus,  the  cost  for  each  block  is 

t'  ~  fc^+1nr+i  +  kr+iUr+i  log(nr+]),  (5.17) 
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and  the  compression  cost  for  level  r  is 


tr  =  prt' 

Pr{.kr+  l^r+l  "f~  kr+l^T+l  log(nr+i)) 

-  pr+1{k‘$+1nr+1  +  kr+1nr+].\og(nr+i))  (5.18) 

-  N(k2+l  +  kr+i  log(nr+i)). 

Combining  the  estimates  (5.18)  for  r  =  1, . . . ,  R  —  2  with  the  estimate  (5.16),  we  obtain  the  total  cost 
for  the  compression  part  of  the  algorithm: 

R- 2 

T2  ~  nJ2( kr+i  +  *r+i  log(nr+i))  +  Nn2R.  (5.19) 

r=l 

In  practice,  we  choose  the  number  of  levels  R  to  be  of  the  order  log  N,  so  that  nR  will  be  a  small  fixed 
number. 

Combining  (5.7)  with  Remark  4.3,  we  obtain  the  estimate 

kr  <  c"log(ni)7r~1,  (5.20) 

where  c"  and  7  are  constants  independent  of  r,  n\,  and  N.  In  particular,  the  estimate  (5.20)  is  valid  for 
some  7  <  0.95. 

Now,  combining  (5.20)  with  (5.19),  we  have 


R- 2 

N  X^r+l  +  kr+l  l0g(nr+1)) 

R- 2 

~  N  ^2  (l°g2(Tll)(72)r  +  I°gnl7rl°g(nr+])J 

T*  — ~  J 

(5.21) 

r—  1 

R—2 

~  AHog2  N  ^  (72)1" 
r  —  *| 

(5.22) 

r —  i 

~  c"'N  log2  N, 

(5.23) 

where  d"  =  (1  -  72)-1.  Thus  T2  has  an  asymptotic  complexity  of 

T2~N  log2  N. 

(5.24) 

Combining  (5.14)  and  (5.24),  we  have 

T  ~  iV2, 

(5.25) 

where  T  denotes  the  cost  of  the  entire  algorithm. 


6  Application  of  the  solver  to  boundary  integral  equations 

In  Section  5,  we  presented  a  generic  algorithm  that  depends  only  on  the  ranks  of  off-diagonal  blocks  of  the 
matrix  to  be  inverted.  When  applied  to  a  dense  n  x  n  matrix  A  with  rank-deficient  off-diagonal  blocks, 
the  algorithm  typically  requires  0(n2)  operations.  In  the  case  that  A  is  a  discretization  of  a  boundary 
integral  operator,  we  can  accelerate  the  algorithm  by  utilizing  the  geometry  of  the  region  on  which  the 
equation  is  to  be  solved,  reducing  the  total  cost  to  0(nlog2n).  Section  6.1  introduces  a  technique  for 
computing  column  skeletons  for  off-diagonal  blocks  that  is  feist er  than  the  generic  technique  in  Section  5, 
and  Section  6.2  describes  an  adaptive  rotation  scheme  that  seeks  to  minimize  ranks  of  off-diagonal  blocks 
by  rearranging  the  parameterization  of  the  boundary. 
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6.1  An  accelerated  procedure  for  computing  column  skeletons 

The  bulk  of  the  computational  cost  of  the  algorithm  presented  in  Section  5.2  lies  in  the  computation  of 
column  skeletons  for  the  factorization  of  off-diagonal  blocks.  When  the  matrix  under  consideration  is  a 
discretization  of  a  boundary  integral  operator,  we  can  exploit  the  geometry  of  the  underlying  contour  and 
compute  column  skeletons  for  the  off-diagonal  blocks  in  a  hierarchical  manner,  reducing  computational 
complexity  of  the  process.  In  this  subsection,  we  describe  a  single-block  compression  technique,  whose 
details  can  be  found  in  [19],  and  then  utilize  it  in  a  hierarchical  scheme  for  the  construction  of  column 
skeletons  of  all  off-diagonal  blocks. 

Consider  a  smooth  contour  T  =  Ti  U  F2  as  in  Fig.  1  (a),  and  let  the  matrix  A  in  the  block  form 


^1,1  ^1,2 

A2,i  ^2,2 


(6.1) 


be  the  discretization  of  the  boundary  integral  operator 

\cr(x)  +  J  K(x,y)o(y)dS(y),  x€T,  (6.2) 

so  that,  for  example,  Aji2  represents  the  potential  on  Fi  generated  by  a  charge 
K  is  the  kernel  of  a  single  or  double  layer  potential  for  Laplace’s  equation. 

Let  rdrr.  be  a  circular  contour  surrounding  r2,  and  let  rexf.  be  the  part  of 
Fig.  1  (b)).  For  any  x  6  rext  and  y  €  r2, 

K{x,y)  =  [  G(x,z)K{z,y)dS(z),  (6-3) 

J  rcirc 

where  G  is  the  Green’s  function  of  Laplace’s  equation  on  the  contour  rci,c.  By  virtue  of  (6.3),  we  can 
interpolate  the  values  of  the  potential  field  generated  by  T2  on  Fcxt  from  the  values  of  the  field  generated 
by  r2  on  rdrt..  This  observation  allows  us  to  compute  a  column  skeleton  for  the  block  Ai.2  via  an  entirely 
local  operation:  instead  of  compressing  the  interaction  between  F2  and  Fi,  it  suffices  to  compress  the 
interaction  between  F2  and  f,  where  f  is  the  contour  formed  by  the  union  of  Fcirc  and  the  part  of  Fj 
that  is  inside  rcirc.  If  T2  is  discretized  using  n  nodes,  then  typically  f  can  be  discretized  using  O(n) 
nodes.  A  column  skeleton  for  Ai,2  can  thus  be  computed  in  0(n2k )  operations,  where  k  is  the  numerical 
rank  of  Ai]2.  A  more  detailed  discussion  can  be  found  in  [19]. 

Remark  6.1  The  method  described  above  can  be  applied  to  any  partial  differential  equation  for  which 
Green’s  identities  hold,  and  the  Green’s  function  does  not  have  to  be  known  explicitly.  In  particular,  the 
method  also  works  for  the  solution  of  boundary  integral  equations  associated  with  the  Helmholtz  equation 
and  Maxwell’s  equations. 

The  remainder  of  this  subsection  describes,  via  an  example,  the  recursive  application  of  the  single¬ 
block  compression  technique  for  the  determination  of  the  column  skeletons  of  all  off-diagonal  blocks.  Let 
A  be  a  discretization  of  the  integral  operator  (6.2)  that  is  represented  via  a  two-level  block  structure 
(4.12).  Each  off-diagonal  block  in  (4.12)  corresponds  to  charges  on  a  segment  of  T,  and  these  segments 
form  a  two-level  partitioning  of  T.  Suppose  for  example  that  the  blocks  02,3>  02,4,  Oi,i,  and  Oi,2  in  (4.12) 
correspond  to  charges  on  the  segments  r2j3,  r2;4 ,  Fi  i,  and  li  .2  respectively  such  that,  in  particular, 
Ti  1  =  r2i3  U  r2,4.  Using  the  single-block  technique  described  above,  we  compress  the  interaction  of  r2:3 
with  the  rest  of  the  contour,  obtaining  a  column  skeleton  for  the  block  02j3  in  terms  of  column  indices 
J4  =  [*! , . . . ,  ik\.  Similarly,  we  compute  a  set  of  column  indices  J2  =  [ji ,  -  ■  •  ,ji]  for  02,4- 

Since  r2j3  and  T2i4  partition  T^i,  we  can  obtain  a  column  skeleton  for  the  block  Oi,i  by  downsampling 
from  the  column  skeletons  already  computed  for  02,3  and  02j4.  In  other  words,  J\  and  J2  correspond 


distribution  on  F2,  and 
T  j  outside  of  rcirc  (see 
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Figure  1:  A  contour  T.  In  Fig.  (a),  the  partitioning  T  =  Tj  U  T2  is  shown  with  T2  drawn  with  a  bold 
line.  In  Fig.  (b),  rcirc  is  the  circular  contour  and  rext  is  the  part  drawn  with  a  dashed  line. 


to  two  clusters  of  charges  on  r^i,  and  the  union  of  these  charges  belongs  to  a  (possibly  discontinuous) 
segment  rpr(i  C  r^i  so  that,  to  compute  a  column  skeleton  for  Oi  i,  it  suffices  to  compress  the  interaction 
between  rpre  and  F 12. 

The  same  idea  can  be  applied  hierarchically  when  A  has  a  multi-level  block  structure:  due  to  the 
underlying  geometry,  the  column  skeletons  computed  for  off-diagonal  blocks  at  one  level  can  be  combined 
to  form  pre-computed  column  skeletons  for  blocks  on  the  upper  level,  thus  reducing  computational  cost 
to  0(N\og  N),  where  N  is  the  total  number  of  discretization  nodes  (see  Section  6.1.1  below). 

6.1.1  Computational  cost 

In  this  subsection,  we  estimate  the  computational  cost  of  the  multi-level  compression  algorithm  described 
in  Section  5.2  in  the  context  of  boundary  integral  equations,  in  which  the  accelerated  technique  in 
Section  6.1  can  be  used. 

We  start  with  observing  that  the  technique  only  accelerates  the  procedure  of  computing  column 
skeletons  for  off-diagonal  blocks,  so  the  estimate  (5.24)  for  the  cost  T2  of  the  main  compression  part 
remains  the  same.  In  particular,  we  will  show  that  the  cost  T\  of  computing  all  column  skeletons  is  now 
O(NlogN). 

We  will  follow  the  notation  in  Section  5.2.  In  addition,  let  k'r  denote  the  average  rank  of  interaction 
between  a  cluster  on  level  r  with  the  rest  of  the  boundary.  For  all  practical  purposes,  we  have 

kT  <  K  <  ckr,  (6.4) 

where  c  is  a  constant  independent  of  r  and  N .  So,  for  simplicity  we  will  use  kT  in  the  following  calculation. 

We  first  apply  the  pivoted  Gram-Schmidt  algorithm  to  compress  the  interaction  of  each  cluster  on 
level  R  with  the  rest  of  the  world.  By  the  technique  of  Section  6.1,  we  are  on  average  compressing  blocks 
of  dimension  ur  by  tir.  Combining  it  with  the  randomized  algorithm  of  Section  2.3  and  following  the 
analysis  in  Section  5.2.1,  the  total  cost  for  level  R  is 

ffl  ~PR(^filog/R  +nRlRkR)  ~  Nur,  (6.5) 


where  we  choose  Ir  =  fc/j  +  20. 

On  levels  r  =  1, . . . ,  R  —  1,  we  downsample  from  the  column  skeletons  computed  on  the  previous 
level,  so  for  each  cluster  on  level  r  we  only  need  to  apply  the  pivoted  Gram-Schmidt  procedure  to  a 
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block  of  2kr+i  columns  and  nr  rows.  Combining  it  with  the  randomized  algorithm  of  Section  2.3, 

tr  pr  ((2kr+  l  log  If  "h  (2kr+\')lrkr'j 

<  pr  (2 kTnr  log  lr  +  2 k^lr) 

~  prkrnr  (6-6) 

~  Nkr, 

where  tr  is  the  total  cost  for  level  r,  and  lr  =  kT  +  20;  above  we  have  used  the  assumption  kr+\  <  kr 
and  the  bounds  (5.9)  and  (5.10). 

Combining  the  estimates  (6.6)  for  r  =  1, . . . ,  R  —  1  with  the  estimate  (6.5),  the  same  analysis  as  in 
Section  5.2.1  shows  that 

Ti  ~  N  log  N,  (6.7) 

where  T)  is  the  cost  of  computing  column  skeletons  for  all  off-diagonal  blocks.  Combining  (6.7)  with 
(5.24),  we  have 

T  ~  N  log2  N,  (6.8) 

where  T  denotes  the  cost  of  the  entire  algorithm. 

6.2  Adaptive  rotation  scheme 

Typically,  matrices  that  arise  from  the  discretization  of  boundary  integral  equations  of  potential  theory 
have  rank-deficient  off-diagonal  blocks.  This  is  the  case  when  the  corresponding  segments  of  the  boundary 
are  “well-separated”  so  that  interacting  clusters  of  nodes  are  sufficiently  far  from  each  other.  When  the 
boundary  is  specified  via  a  parameterization  such  that  the  separation  between  two  points  is  not  well 
predicted  by  their  corresponding  distance  in  the  parameter  space,  there  may  exist  interacting  segments 
that  are  close  to  each  other,  leading  to  off-diagonal  blocks  that  have  high  ranks.  The  dumbbell-shaped 
contour  F  in  Fig.  2  is  an  example  that  needs  to  be  treated  with  care  in  order  to  avoid  such  a  problem. 
In  this  subsection,  we  introduce  a  scheme  that  adaptively  arranges  the  partitioning  of  a  given  contour 


Figure  2:  The  dumbbell-shaped  contour  T. 

so  that  clusters  of  nodes  interacting  with  each  other  are  well-separated  in  the  plane. 

Consider  the  contour  T  in  Fig.  2  and  the  matrix  A  formed  by  discretizing  the  boundary  integral 
operator  (6.2)  according  to  some  given  parameterization  of  T.  The  scheme  proceeds  in  a  hierarchical 
manner.  On  the  coarsest  (first)  level,  a  “center”  of  T  is  chosen  (the  center  of  mass  is  an  acceptable 
choice).  Then,  an  angle  6  is  chosen  randomly  and  T  is  partitioned  into  two  segments  by  a  line  with 
inclination  6  passing  through  the  chosen  center  (see  Fig.  3  (a)).  This  divides  the  discretization  nodes  on 
T  into  two  clusters,  and  the  rank  of  interaction  k  between  the  clusters  is  estimated.  If  the  computed  k 
is  less  than  a  pre-set  threshold,  the  partition  is  accepted;  otherwise,  another  angle  9  is  chosen  and  the 
above  process  is  repeated. 

Once  a  partition  T  =  Ti  U  T2  is  found  on  the  coarsest  level,  the  same  procedure  is  applied  to  Tj  and 
r 2  respectively  to  obtain  partitions  for  the  second  level.  We  then  recursively  apply  the  procedure  to 
obtain  “optimal”  partitions  of  the  contour  for  all  levels.  Finally,  we  rearrange  the  rows  and  columns  of 
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r2 


(a) 


(b)  (c) 

Figure  3:  Fig.  (a)  shows,  on  the  first  level,  the  partition  T  =  Ti  U  T2  obtained  by  the  adaptive  rotation 
scheme  for  the  contour  F  in  Fig.  2,  indicated  by  a  dashed  line.  The  chosen  center  of  T  is  indicated 
with  an  arrow.  On  the  second  level,  partitions  obtained  for  the  segments  rx  and  T2  are  shown  in  Fig. 
(b)  and  (c)  respectively.  In  general,  on  each  level  a  contour  is  split  by  lines  into  segments  so  that 
interacting  clusters  are  well-separated  from  each  other,  resulting  in  rank-deficient  off-diagonal  blocks  in 
the  discretized  operator. 


r2  \ 


the  matrix  A  according  to  the  partitioning.  Fig.  3  shows,  on  the  first  two  levels,  the  partitions  that  are 
typically  obtained  by  the  scheme  for  T. 

The  only  expensive  part  of  the  scheme  lies  in  computing  the  interaction  rank  between  two  clusters, 
and  if  it  is  done  via  the  randomized  algorithm  of  Section  2.3  the  scheme  involves  a  cost  of  0(n2),  where  n 
is  the  number  of  nodes  on  the  contour.  However,  the  same  principle  based  on  Green’s  theorem  described 
in  Section  6.1  can  be  used  to  speed  up  the  procedure.  Suppose,  for  example,  we  wish  to  compute  the 
rank  of  an  m  x  A:  matrix  B  corresponding  to  the  potential  generated  by  Ti  on  T2,  as  arranged  in  Fig.  3(a). 
Consider  a  segment  r0  that  belongs  to  Ti  and  is  inside  of  a  box,  as  indicated  in  the  figure.  Since  T0  is 
sufficiently  separated  from  the  dashed  line,  we  can,  by  virtue  of  Green’s  theorem,  replace  the  charges  on 
r0  by  some  fixed  small  number  of  artificial  charges  placed  on  the  boundary  of  the  box,  and  compress 
the  interaction  between  the  charges  on  the  box  and  the  nodes  on  T2.  This  can  be  done  systematically 
on  Ti  by  applying  the  method  to  those  boxes  in  the  structure  that  are  well-separated  from  the  line.  As 
a  result,  we  obtain  an  m  x  l  matrix  C  whose  column  dimension  l  is  typically  much  less  than  that  of  B, 
and  whose  rank  gives  a  good  approximation  to  the  rank  of  B,  provided  that  the  artificial  charges  on 
the  boxes  are  suitably  chosen  (see,  for  example,  [19]).  This  approach  reduces  the  column  dimension  of 
a  block  whose  rank  has  to  be  computed.  In  the  experimental  results  in  Section  7,  we  will  see  that  for 
typical  contours  the  scheme  involves  a  cost  of  O(n),  where  n  is  the  number  of  nodes  on  the  contour,  and 
the  time  taken  by  the  scheme  never  constitutes  more  than  8  percent  of  the  total  solution  time. 

7  Numerical  results 

In  this  section,  we  present  the  results  of  a  number  of  numerical  experiments  performed  to  assess  the 
efficiency  of  the  schemes  described  in  Sections  5  and  6. 

In  each  of  the  experiments,  we  apply  Nystrom  discretization  to  one  of  the  following  boundary  integral 
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equations: 


r{p)+hIra{y)^og[p-y]dS{y) 

-\°{p)+k  i G[y)  ( J- log  iP  ~ p|  + 0 dS{y) 

r(p)+i/<T(2/)^logb_2/|d5(2/) 
~rip)  +  hjv  a{y)  (£; log  |p  ~ 2/1  +  0 dS{y) 


f(p) 

(7.1) 

I  ip) 

(7.2) 

f(p) 

(7.3) 

Up), 

(7.4) 

and  solve  the  resulting  linear  systems. 

The  kernels  in  equations  (7.1)-(7.4)  are  smooth  over  smooth  curve  segments.  Since  the  contours 
considered  in  the  first  three  experiments  are  smooth,  in  these  cases  we  discretize  the  equations  using 
piecewise  Gaussian  quadrature  on  an  equispaced  mesh.  On  the  other  hand,  the  contour  in  the  last 
experiment  contains  a  corner  point,  over  which  the  kernel  is  singular.  In  this  case,  special  treatment  is 
needed  and  we  discretize  the  equation  near  the  corner  using  piecewise  Gaussian  quadrature  on  a  simply 
graded  mesh,  the  details  of  which  are.  described  in  Section  7.4  below. 

We  compare  three  methods  for  the  solution  of  the  resulting  linear  systems: 

•  Method  1.  Using  the  multi-level  solver  combined  with  the  adaptive  rotation  scheme. 

•  Method  2.  Using  the  multi-level  solver,  without  using  the  adaptive  rotation  scheme. 

•  Method  3.  Using  a  “brute  force”  QR  solver  (with  asymptotic  complexity  0(n3))  to  invert  the 
linear  system. 

In  the  case  of  the  Dirichlet  problem,  the  right-hand  side  /  is  a  potential  field  generated  by  a  collection 
of  randomly  placed  charges,  and  in  the  case  of  the  Neumann  problem,  /  is  the  normal  derivative  of  the 
potential  field  generated  by  such  a  collection  of  charges.  In  each  experiment  the  layer  potential  generated 
by  the  computed  charge  distribution  a  was  evaluated  at  a  collection  of  40  randomly  placed  points.  The 
values  obtained  were  then  compared  with  the  exact  potential. 

The  solvers  were  implemented  in  Fortran  77  and  compiled  with  the  Lahey/Fujitsu  Linux64  Fortran 
Compiler  Release  8.10a.  All  experiments  were  run  on  a  PC  with  an  Intel  Core  i7  2.67  GHz  processor 
and  12GB  of  memory.  No  attempt  was  made  to  parallelize  any  of  the  code.  The  following  notation  is 
used  when  presenting  the  numerical  results: 


R  the  number  of  levels  in  the  multi-level  solver 

N  the  number  of  discretization  nodes  used 

frot  the  CPU  time  taken  by  the  adaptive  rotation  scheme  in  Method  1 

t solve, l  total  CPU  time  taken  in  solving  for  o  by  Method  1  (which  includes  frot) 

fsoive,2  total  CPU  time  taken  in  solving  for  a  by  Method  2 

tsoive,3  total  CPU  time  taken  in  solving  for  a  by  Method  3 

.Fj-ei,!  the  relative  Z2-norm  error  obtained  by  Method  1 

f?rei,2  the  relative  Z2-norm  error  obtained  by  Method  2 

Ete i>3  the  relative  Z2-norm  error  obtained  by  Method  3 

By  the  relative  I2-norm  error  we  mean  the  quantity  ||v  —  vd^/IMhi  where  j  denotes  the  exact 

potential  field  at  the  40  random  points  and  {i^}^  denotes  the  potential  field  given  by  the  computed 
<t.  All  timings  presented  are  in  seconds  of  CPU  time. 
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Figure  4:  A  rippled  contour  with  a  thin  handle. 


7.1  Example:  a  rippled  contour  with  a  thin  handle 

In  this  subsection,  we  present  results  for  the  rippled  contour  with  a  thin  handle  shown  in  Fig.  4.  The 
contour  was  discretized  using  between  200  and  51200  nodes  and  the  integral  equation  (7.2)  associated 
with  the  exterior  Dirichlet  problem  was  solved.  Table  1  presents  the  results. 

In  this  example,  we  see  that,  roughly  speaking,  the  timings  tsoive,x  and  fSolve,2  taken  by  the  multi-level 
solver  scale  slightly  more  than  linearly  with  the  number  of  discretization  nodes  N.  This  agrees  with 
the  estimate  (6.8)  of  its  performance  in  Section  6.1.1.  As  shown  in  Fig.  5  (a),  the  contour  is  originally 
parameterized  in  such  a  way  that  its  long,  narrow  “handle”  part  causes  the  pair  of  clusters  on  the 
coarsest  (first)  level  to  have  high  rank  of  interactions.  Fig.  5  (b)  shows  the  clusters  as  rearranged  by  the 
adaptive  rotation  scheme.  We  observe  that  the  scheme  reduces  the  CPU  times  roughly  by  the  factor  of 
3. 

Finally,  we  compare  the  multi-level  solver  with  the  “brute  force”  QR  solver  that  takes  0{n3)  op¬ 
erations.  We  observe  that  for  small-scale  problems,  the  “brute  force”  approach  is  more  efficient  (as 
expected).  The  observed  break-even  point  is  about  n  =  1100,  after  which  the  solver  of  this  paper  is 
more  efficient.  At  n  =  3200,  the  solver  of  this  paper  (combined  with  the  rotation  scheme)  performs 
about  7.5  times  faster  than  the  “brute  force”  scheme.  Needless  to  say,  in  practice  one  would  always  use 
the  more  efficient  scheme  for  the  problem  to  be  solved. 

7.2  Example:  A  cross-shaped  contour 

In  this  subsection,  we  present  results  for  the  cross-shaped  contour  shown  in  Fig.  6.  The  contour  was 
discretized  using  between  200  and  51200  nodes  and  the  integral  equation  (7.4)  associated  with  the  interior 
Neumann  problem  was  solved.  Table  2  presents  the  results. 

Similar  to  the  contour  in  Section  7.1,  the  one  here  has  several  long  and  narrow  paxts  that  introduce 
high  ranks  in  the  off-diagonal  blocks  of  the  discretized  operator.  Fig.  7  shows,  in  particular,  that  in  the 
original  arrangement  of  the  clusters  interactions  on  the  first  two  levels  are  of  high  rank.  Fig.  8  shows 
the  clusters  as  rearranged  by  the  rotation  scheme,  and  the  reduced  ranks  of  interactions.  We  observe 
that  the  scheme  reduces  the  CPU  times  roughly  by  the  factor  of  4,  and  the  time  it  spent  constitutes  less 
than  8  percent  of  the  total  time. 


(a) 


(b) 


Figure  5:  (a)  The  original  arrangement  of  the  clusters  of  nodes  on  the  first  level  for  the  contour  shown 
in  Fig.  4.  (b)  The  clusters  on  the  first  level  as  rearranged  by  the  adaptive  rotation  scheme.  The  contour 
was  discretized  using  3200  nodes,  and  the  ranks  of  interactions  between  the  pairs  of  clusters  in  (a)  is 
about  285  and  in  (b)  is  about  50. 


Table  1:  Computational  results  for  the  boundary  integral  equation  (7.2)  associated  with  the  exterior 
Dirichlet  problem  on  the  contour  shown  in  Fig.  4. 


N 

R 

^rot 

^solvc.l 

■f'rcl,! 

^solvc,2 

•£^rcl,2 

£solvc,3 

■^rcl,3 

4 

2.13e  - 

03 

6.73e  -  02 

4.39e  -01 

3.99e  -  01 

3.87e  -  01 

400 

5 

6.92e  - 

03 

1.61e  —  01 

9.03e  -  03 

9.94e  -  03 

800 

6 

1.67e  - 

02 

4.52e  —  01 

5.33e  -  04 

4.82e  -  04 

5.41e  -  04 

1600 

7 

4.91e  - 

02 

1.26e  +  00 

1.26e  -  06 

1.27e  -  06 

3200 

8 

9.42e  - 

02 

2.63e  +  00 

3.99e  -  13 

5.37e  -  13 

1.96e  +  01 

5.32e  -  13 

6400 

9 

2.04e  - 

01 

5.73e  +  00 

3.64e  -  13 

1.69e  +  01 

6.86e  -  13 

4.72e-  14 

12800 

10 

3.86e  - 

01 

1.20e  +  01 

8.04e  -  13 

5.05e  -  13 

25600 

11 

7.16e  - 

01 

2.17e  +  01 

1.88e-  13 

- 

- 

51200 

12 

1.43e  +  00 

4.88e  +  01 

4.45e  -  13 

3.74e  -  13 

~ 

Table  2:  Computational  results  for  the  boundary  integral  equation  (7.4)  associated  with  the  interior 
Neumann  problem  on  the  contour  shown  in  Fig.  6. 


N 

R 

^rot 

^solve,l 

■£^rel,l 

£soIve,2 

■Sr  el,  2 

^solve,3 

EIe], 3 

WiliM 

4 

2.22e  -  03 

7.00e  -  02 

1.80e  — 

02 

1.19e  -  01 

1.62e  — 

row 

4.65e  -  03 

1.72e  -  02 

5 

7.30e  -  03 

1.92e  -  01 

2.69e  - 

04 

4.75e  -  01 

2.61e  — 

3.35e  -  02 

2.66e  -  04 

800 

6 

1.86e  -  02 

5.54e  -  01 

1.59e  - 

07 

1.74e  +  00 

1.62e  - 

2.56e  -  01 

1.62e  -  07 

1600 

7 

4.99e  -  02 

1.34e+  00 

6.32e  - 

12 

6.03e  +  00 

12 

2.31e  +  00 

5.87e  -  12 

3200 

8 

l.lle  —  01 

3.00c  ~f“  00 

1.21e  — 

12 

1.51e  +  01 

8.73e  - 

13 

1.96e  +  01 

1.16e  -  12 

6400 

9 

2.73e  -  01 

6.36e  +  00 

9.81e  — 

13 

2.44e  +  01 

2.63e  - 

13 

1.61e  +  02 

7.46e  -  13 

12800 

10 

5.63e  -  01 

1.25e  +  01 

3.25e  - 

13 

4.61e  +  01 

2.16e  — 

13 

1.38e  +  03 

1.63e  -  13 

25600 

11 

2.10e  +  00 

2.67e  +  01 

5.69e  - 

13 

9.44e  +  01 

13 

- 

- 

51200 

12 

3.37e  +  00 

6.13e  +  01 

2.83e  - 

13 

1.97e  +  02 

3.44e  - 

13 

- 

- 

22 


Figure  6:  A  cross-shaped  contour.  Its  arclength  is  about  40. 


(a) 


'  (b) 


’  (c)° 


Figure  7:  The  original  arrangement  of  the  clusters  of  nodes  on  the  contour  in  Fig.  6  on  (a):  the  first  level 
and  (b),(c):  the  second  level.  The  contour  was  discretized  using  3200  nodes.  The  rank  of  interactions 
in  (a)  is  about  270  and  the  ranks  of  interactions  in  (b)  and  (c)  are  both  about  150. 
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Figure  8:  The  clusters  on  the  contour  in  Fig.  6  on  (a):  the  first  level  and  (b),(c):  the  second  level,  as 
rearranged  by  the  adaptive  rotation  scheme.  The  rank  of  interactions  in  (a)  is  about  64  and  the  ranks 
of  interactions  in  (b)  and  (c)  are  both  about  45. 
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7.3  Example:  A  tank-shaped  contour 

In  this  subsection,  we  present  results  for  the  tank-shaped  contour  in  Fig.  9.  The  contour  was  discretized 
using  between  200  and  51200  nodes  and  the  integral  equation  (7.3)  associated  with  the  exterior  Neumann 
problem  was  solved.  Table  3  presents  the  results. 


1  23456789 


Figure  9:  A  tank-shaped  contour.  Its  arclength  is  about  23. 

We  observe  that  the  asymptotic  complexity  of  the  multi-lever  solver  remains  the  same  as  for  the 
contours  in  Sections  7.1  and  7.2,  but  the  times  tsoivc,2  involve  smaller  constants.  Here,  Fig.  10  (a)  shows 
the  original  arrangement  of  the  clusters  on  the  first  level  and  the  associated  rank  of  interactions,  and 
Fig.  10  (b)  shows  the  clusters  as  rearranged  by  the  rotation  scheme.  In  particular,  the  reduction  in  rank 
is  not  as  substantial  as  those  in  the  previous  examples,  and  the  reduction  in  CPU  times  is  relatively 
insignificant. 


1  23456789 


(b) 

Figure  10:  (a)  The  original  arrangement  of  the  clusters  of  nodes  on  the  first  level  for  the  contour  in 
Fig.  9.  (b)  The  clusters  as  rearranged  by  the  adaptive  rotation  scheme.  Each  cluster  contains  about 
1600  nodes,  and  the  ranks  of  interactions  between  the  pairs  of  clusters  in  (a)  is  about  120  and  in  (b)  is 
about  40. 

7.4  Example:  A  PacMan-shaped  contour 

In  this  subsection,  we  present  results  on  the  PacMan-shaped  contour  F  shown  in  Fig.  11,  which  contains 
a  single  corner  70  of  exterior  angle  0.6  radian.  The  boundary  integral  equation  (7.1)  associated  with  the 
interior  Dirichlet  problem  is  solved  on  F. 
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Table  3:  Computational  results  for  the  boundary  integral  equation  (7.3)  associated  with  the  exterior 
Neumann  problem  on  the  contour  shown  in  Fig.  9. 


N 

R 

^rot 

^solve,l 

£rel,l 

^solve,2 

■S'rel ,  2 

£solve,3 

-£^rel  ,3 

200 

4 

2.23e  -  03 

6.15e-  02 

1.19e  —  03 

6.99e  -  02 

1.29e  - 

03 

4.67e  -  03 

1.14e  -  03 

400 

5 

7.48e  -  03 

1.51e-  01 

1.60e  -05 

2.10e  -  01 

1.17e  — 

05 

3.35e  -  02 

1.15e  -  05 

800 

6 

1.92e  -  02 

4.26e  -  01 

2.28e  -  11 

5.70e  -  01 

2.61e  - 

11 

2.56e  -  01 

2.40e  —  11 

1600 

7 

5.37e  -  02 

9.10e  -  01 

1.03e-  12 

1.38e  +  00 

l.OOe  - 

12 

2.31e  +  00 

l.lle  —  12 

3200 

8 

1.07e  -  01 

1.91e  +  00 

1.37e-  13 

2.81e  +  00 

2.53e  — 

13 

1.92e  +  01 

1.20e  -  13 

6400 

9 

2.63e  —  01 

4.07e  4-  00 

4.23e  -  13 

5.67e  +  00 

4.09e  - 

13 

1.51e  +  02 

2.98’e  -  13 

12800 

10 

6.38e  -  01 

8.92e  +  00 

1.48e  —  13 

1.22e  +  01 

2.63e  - 

13 

1.38e  +  03 

1.70e  -  13 

25600 

11 

1.14e  +  00 

1.94e  +  01 

2.35e  -  14 

2.79e  +  01 

l.lOe  — 

13 

- 

- 

51200 

12 

2.42e  +  00 

4.92e  +  01 

7.70e  -  14 

6.44e  +  01 

8.29e  - 

14 

- 

Below,  we  describe  a  simple  expedient  we  used  to  obtain  a  high-accuracy  discretization  of  a  boundary 
with  a  corner.  The  discretization  of  the  boundary  integral  equation 


■y\dS{y)  =  f(p) 


(7.5) 


via  the  Nystrom  method  with  piecewise  Gaussian  quadrature  displays  high  rates  of  convergence,  so  long 
as  the  kernel  K(x,y)  and  the  layer  density  a  are  smooth.  However,  if  the  contour  T  contains  corner 
points,  then  the  kernel  K(x,y)  is  singular,  and  so  is  the  layer  density  o.  As  a  result,  piecewise  Gaussian 
quadrature  over  an  equispaced  mesh  will  not  be  accurate.  Standard  approaches  for  discretizing  equations 
of  the  form  (7.5)  near  a  corner  point  70  amount  to  using  a  dense  mesh  of  points  near  70  (see,  for  instance, 
[14,  18,  2]).  Following  the  terminology  of  [14],  we  call  a  subdivision  of  the  interval  [0, 1]  into  subintervals 
with  the  endpoints 

j  =  0,1,2,..., a,  (7.6) 


a  simply  graded  mesh.  In  this  experiment,  the  integral  equation  (7.5)  is  discretized  over  a  small  segment 
containing  70  by  first  mapping  [0,1]  onto  a  small  interval  on  each  side  of  70-  We  then  use  Gaussian 
quadrature  to  discretize  the  image  of  each  of  the  subintervals  comprising  the  simply  graded  mesh  on 
[0,1].  Note  that  the  resulting  discretization  omits  a  small  region  around  7o-  The  refinement  of  the 
simply  graded  mesh  around  70  is  controlled  by  adjusting  a  cut-off  £r„t  on  the  minimum  length  of  the 
subintervals  comprising  the  mesh;  in  other  words,  given  ecut,  we  choose  s  in  (7.6)  to  be  the  smallest 
integer  such  that 

^  <  ecut-  (7-7) 

The  part  of  T  away  from  70  is  discretized  via  an  equispaced  mesh.  The  same  number  of  Gaussian  nodes 
is  used  for  each  of  the  subintervals  in  the  mesh  discretizing  T.  Fig.  12  depicts  the  subintervals  that 
comprise  the  mesh  discretizing  T  when  ecut  is  set  at  10~8.  In  this  setting,  the  simply  graded  mesh 
around  the  corner  70  contains  about  1150  nodes.  The  proof  of  convergence  of  such  a  discretization  to 
the  solution  of  the  underlying  integral  equation  is  somewhat  involved,  and  can  be  found,  for  example, 
in  [14]. 

Table  4  presents  the  result  of  the  experiment.  We  observe  that  T  is  originally  parameterized  in  such  a 
way  that  discretization  nodes  to  each  side  of  70  belong  to  separate  clusters  (see  Fig.  13).  As  we  decrease 
ecut,  the  distribution  of  nodes  around  70  becomes  denser  and  the  rank  of  interactions  rtop  of  the  clusters 
increases.  In  particular,  the  0(N  log2  N)  estimate  (6.8)  on  the  cost  of  the  multi-level  solver  does  not 
apply  anymore;  indeed,  it  relies  on  the  assumption  (4.25)  that  the  ranks  of  the  off-diagonal  blocks  depend 
logarithmically  on  the  block  sizes,  and  this  assumption  is  not  valid  in  this  case  (see  Table  4). 

For  this  problem,  it  takes  ecut  =  10“ 15  to  attain  an  error  on  the  order  of  10  1 11 .  This  corresponds  to 
using  about  4370  nodes  in  total  to  discretize  T,  about  2150  of  which  used  for  the  simply  graded  mesh 
around  the  corner.  At  this  setting,  the  multi-level  solver,  combined  with  the  rotation  scheme,  performs 
about  13.5  times  faster  than  the  “brute  force”  scheme.  By  extrapolation,  we  expect  the  multi-level  solver 
to  gain  more  considerable  advantage  in  the  case  of  solving  (7.5)  on  domains  with  multiple  comers. 


Remark  7.1  The  error  in  Table  4  is  the  best  precision  achieved  using  a  simply  graded  mesh  around  the 
comer  in  double  precision  arithmetic.  To  get  a  higher  precision,  one  can  either  run  the  experiment  in 
extended  precision  arithmetic,  or  adopt  discretization  methods  other  than  the  simply  graded  mesh. 


8  Generalizations  and  conclusions 

We  have  presented  a  numerical  algorithm  for  the  construction  of  compressed  factorizations  of  inverses 
of  matrices  possessing  rank-deficient  off-diagonal  blocks.  When  applied  to  matrices  arising  from  the 
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Figure  12:  (a)  The  mesh  used  to  discretize  T  when  ecut  is  set  at  10~8.  There  are  in  total  136  subintervals 
and  about  3260  nodes,  (b)  A  close-up  of  the  simply  graded  mesh  around  the  corner,  contained  by  the 
dashed  box  in  (a).  The  mesh  consists  of  48  subintervals  and  about  1150  nodes. 


Figure  13:  The  arrangement  of  the  clusters  on  the  first  level  under  the  original  parameterization  of  T. 


Table  4:  Computational  results  for  the  boundary  integral  equation  (7.1)  associated  with  the  interior 
Dirichlet  problem  on  the  contour  T  shown  in  Fig.  11.  Here,  rtop  denotes  the  approximate  rank  of 
interactions  of  the  clusters  on  the  first  level  under  the  original  parameterization  of  T,  as  arranged  in 
Fig.  13. 


£cut 

N 

£rot 

£solve,l 

-^rel,l 

^solve,2 

-^rel,2 

T  top 

^solve,3 

-^rel,3 

2784 

9.07e  -  02 

1.32e  +  00 

4.86e  +  00 

7.66e  -  07 

192 

1.26e  +  01 

7.22e  -  05 

HI 

8.85e  -  02 

1.57e  +  00 

8.48e  -  06 

9.05e  +  00 

7.20e  -  06 

252 

1.72e  +  01 

7.24e  -  06 

l.Oe-  09 

mm 

9.58e  -  02 

5.92e  -  07 

2.10e  +  01 

5.54e  -  07 

ig|* 

2.27e  +  01 

4.96e  —  07 

l.Oe-  11 

3744 

9.74e  -  02 

2.32e  +  00 

3.60e  —  08 

3.96e  +  01 

3. lie  -  08 

3.15e  +  01 

3.39e  -  08 

l.Oe  -  13 

2.85e  +  00 

6.14e  +  01 

3.78e  -  09 

464 

3.74e  +  01 

3.40e  -  09 

l.Oe  -  15 

4368 

3.51e  +  00 

3.84e  -  10 

9.90e  +  01 

3.59e  -  10 

536 

4.76e  +  01 

3.30e  -  10 
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discretization  of  boundary  integral  equations  associated  with  the  solution  of  Laplace’s  equation  in  two 
dimensions,  the  algorithm  typically  has  a  cost  proportional  to  N  log2  N,  where  N  is  the  number  of  nodes 
in  the  discretization  of  the  boundary. 

Several  straightforward  generalizations  of  the  scheme  of  this  paper  suggest  themselves: 

1.  The  adaptive  rotation  scheme  of  this  paper  generalizes  to  three  dimensions,  where  it  addresses 
an  outstanding  problem  of  considerable  interest.  In  particular,  the  scheme  extends  the  applicability  of 
fast  direct  solvers  to  integral  operators  defined  on  boundary  surfaces,  which  are  usually  specified  via 
parameterizations  r  :  [0, 1]  x  [0, 1]  — >  R3  such  that  the  distance  between  two  points  (si,  ti)  and  (s2i  f 2)  in 
the  parameter  space  bears  no  relation  to  the  distance  |]r(si,  ti)  —  r(s2,f2)||  between  their  corresponding 
points  on  the  surface.  A  generalization  of  the  scheme  to  this  setting  will  allow  fast  direct  solvers  to  utilize 
the  geometry  of  the  boundary  that  is  usually  not  easily  extractable  from  a  given  parameterization. 

2.  The  scheme  of  this  paper  can  be  readily  extended  to  the  case  of  boundary  integral  equations 
associated  with  the  Helmholtz  equation: 

Au  +  lj2u  =  0,  (8-1) 

assuming  u  is  not  too  large.  This  work  is  in  progress  and  will  be  reported  at  a  later  date. 

3.  The  direct  solver  presented  in  this  paper  can  be  modified  to  compute  a  compressed  complete 
orthogonal  decomposition  of  the  input  matrix,  rather  than  a  compressed  factorization  of  the  inverse. 
Such  an  algorithm  will  have  applications  to  least-squares  solutions  of  rank-deficient  systems  of  equations: 

Ax  =  b ,  (8-2) 


where  A  is  an  n  by  n  matrix  of  rank  k  <  n. 
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